% please textwrap! It helps svn not have conflicts across a multitude of
% lines.
%
% vim:set textwidth=78:

\section{Introduction}
In this report, we examine techniques to identify rare examples in large data
sets. As Elkan explains in \cite{class_notes}, the interesting examples tend
to be the rare ones, such as fraudulent credit card transactions.
Consequently, we focus on identifying whether a direct mailing will result in
a donation using the KDD Cup 1998 learning data \cite{kdd_cup_data}, where
only about $5\%$ of the examples are positive. Identifying these examples is
interesting because it might bring insight into how to improve returns in the
future. Similarly, building models to identify these rare events helps
produce better estimates to guide future decisions.

To illustrate these techniques, we build models using a linear support vector
machine (SVM) and a Naive Bayes learner to accurately predict whether a
direct mailing results in a response.

The remainder of this report is organized as follows. Section~\ref{sec:flow}
explains the best flow we found to preprocess the data and create a model.
Section~\ref{sec:methodology} explains the techniques used to prune and
preprocess the data. Section~\ref{sec:results} presents the results of these
techniques and our evaluation. Section~\ref{sec:conclusion} concludes.
